{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Interactive Web application with Dash\n",
    "\n",
    "Authors: Prof. med. Thomas Ganslandt <Thomas.Ganslandt@medma.uni-heidelberg.de> <br>\n",
    "and Kim Hee <HeeEun.Kim@medma.uni-heidelberg.de> <br>\n",
    "Heinrich-Lanz-Center for Digital Health (HLZ) of the Medical Faculty Mannheim <br>\n",
    "Heidelberg University\n",
    "\n",
    "This tutorial is prepared for TMF summer school on 03.07.2019\n",
    "\n",
    "## Prerequisite: MIMIC-III files locally\n",
    "\n",
    "You should place the following MIMIC-III data files in the data/ subfolder:\n",
    "\n",
    "* D_LABITEMS.csv\n",
    "* LABEVENTS.csv\n",
    "* PRESCRIPTIONS.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Dash\n",
    "* https://dash.plot.ly/gallery <br>\n",
    "* Dash is a Python framework for creating data-driven web applications <br>\n",
    "* Dash apps are written on top of Flask, Plotly, and React\n",
    " * Flask is a Python web framework\n",
    " * Plotly is specifically a charting library built on top of D3.js\n",
    " * React is a JavaScript library for building user interfaces maintained by Facebook and a community "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Case Study 2: Labitems Trend Visualization\n",
    "[Trend analysis](https://en.wikipedia.org/wiki/Trend_analysis) is the widespread practice of collecting information and attempting to spot a pattern. This case study will illustrate a drug reaction of a sepsis patient. This case study tracks the biomarker and prescription history of patient 41976. It visualizes the relation between two key biomarkers of sepsis (White Blood Cells and Neutrophils) and \n",
    "\n",
    "* '41976' patient is choosen for this case study because this patient contains most and interesting records among other sepsis patients '10006', '10013', '10036', '10056', '40601'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Import Python pakages (1/6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Dash packages installation\n",
    "# !conda install -c conda-forge dash-renderer -y\n",
    "# !conda install -c conda-forge dash -y\n",
    "# !conda install -c conda-forge dash-html-components -y\n",
    "# !conda install -c conda-forge dash-core-components -y\n",
    "# !conda install -c conda-forge plotly -y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import dash\n",
    "import dash_core_components as dcc\n",
    "import dash_html_components as html\n",
    "import flask\n",
    "from dash.dependencies import Input, Output\n",
    "import plotly.graph_objs as go\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "pd.set_option('display.max_columns', 999)\n",
    "import pandas.io.sql as psql"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Data collection (2/6)\n",
    "* Query `d_labitems` table (Dictionary table for mapping)\n",
    "* Query `labevents` table (History of the labitem order)\n",
    "* Join two tables\n",
    "* Query `prescriptions` table (History of the prscription order)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d_lab = pd.read_csv(\"data/D_LABITEMS.csv\")\n",
    "d_lab.columns = map(str.lower, d_lab.columns)\n",
    "d_lab.drop(columns = ['row_id'], inplace = True)\n",
    "\n",
    "lab = pd.read_csv(\"data/LABEVENTS.csv\")\n",
    "lab.columns = map(str.lower, lab.columns)\n",
    "lab = lab[lab['subject_id'] == 41976]\n",
    "lab.drop(columns = ['row_id'], inplace = True)\n",
    "\n",
    "lab = pd.merge(d_lab, lab, on = 'itemid', how = 'inner')\n",
    "print(lab.columns)\n",
    "lab[['subject_id', 'hadm_id', 'itemid', 'label', 'value']].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "presc = pd.read_csv(\"data/PRESCRIPTIONS.csv\")\n",
    "presc.columns = map(str.lower, presc.columns)\n",
    "presc = presc[presc['subject_id'] == 41976]\n",
    "presc.drop(columns = ['row_id'], inplace = True)\n",
    "print(presc.columns)\n",
    "presc[['subject_id', 'hadm_id', 'icustay_id', 'drug']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "###  Data preparation for labevents table (3/6)\n",
    "* Convert data type to datetime and extract only year value "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "lab['charttime'] = pd.to_datetime(lab['charttime'], errors = 'coerce')\n",
    "lab.sort_values(by='charttime', inplace=True)\n",
    "lab.set_index('charttime', inplace = True)\n",
    "lab.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "###  Data preparation for prescriptions table (4/6)\n",
    "* Filter conditions:\n",
    " * unit: 'mg'\n",
    " * antibiotics medicines: ('Vancomycin','Meropenem','Levofloxacin')\n",
    "* Contruct a normalized dose column\n",
    "* Convert data type to datetime and extract only year value "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "presc['dose_val_rx'] = pd.to_numeric(presc['dose_val_rx'], errors = 'coerce')\n",
    "presc = presc[presc['dose_unit_rx']=='mg']\n",
    "presc = presc[presc['drug'].isin(['Vancomycin','Meropenem','Levofloxacin'])]\n",
    "\n",
    "temp_df = pd.DataFrame()\n",
    "for item in presc.drug.unique():\n",
    "    temp = presc[presc['drug'].str.contains(item)]\n",
    "    temp['norm_size'] = temp['dose_val_rx'] / temp['dose_val_rx'].max()\n",
    "    temp_df = temp_df.append(temp)\n",
    "presc = pd.merge(presc, temp_df, on=list(presc.columns))\n",
    "\n",
    "presc['startdate'] = pd.to_datetime(presc['startdate'], errors = 'coerce')\n",
    "presc.sort_values(by='startdate', inplace=True)\n",
    "presc.set_index('startdate', inplace = True)\n",
    "presc.head(1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "###  Create a structure and presentation of your web with HTML and CSS (5/6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "list_patient = ['41976']\n",
    "list_biomarker = ['White Blood Cells', 'Neutrophils']\n",
    "list_drug = ['Vancomycin','Meropenem','Levofloxacin']\n",
    "\n",
    "# stylesheets = ['./resources/bWLwgP.css']\n",
    "app = dash.Dash()\n",
    "\n",
    "app.layout = html.Div([\n",
    "\n",
    "    dcc.Dropdown(\n",
    "        id = 'patient',\n",
    "        value = '41976',\n",
    "        multi = False,\n",
    "        options = [{'label': i, 'value': i} for i in list_patient],\n",
    "    ),\n",
    "    dcc.Dropdown(\n",
    "        id = 'biomarker',\n",
    "        value = 'White Blood Cells',\n",
    "        multi = False,\n",
    "        options = [{'label': i, 'value': i} for i in list_biomarker],\n",
    "    ),\n",
    "    dcc.Dropdown(\n",
    "        id = 'drug',\n",
    "        value = ['Vancomycin'],\n",
    "        multi = True,\n",
    "        options = [{'label': i, 'value': i} for i in list_drug],\n",
    "    ),\n",
    "    dcc.Graph(id = 'graph'),\n",
    "])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "###  Define the reactive behavior with Python (6/6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "@app.callback(Output('graph', 'figure'), \n",
    "              [Input('patient', 'value'),\n",
    "               Input('biomarker', 'value'),\n",
    "               Input('drug', 'value')])\n",
    "def update_graph(patient, biomarker, drug):\n",
    "    traces = []\n",
    "    temp_l = lab[lab['subject_id'].astype(str) == patient]\n",
    "    temp_p = presc[presc['subject_id'].astype(str) == patient]\n",
    "    temp_min = 0\n",
    "    \n",
    "    item = biomarker\n",
    "    temp = temp_l[temp_l['label'] == item]\n",
    "    temp_min = float(temp.value.astype(float).min())\n",
    "    trace = go.Scatter(\n",
    "                x = temp.index,\n",
    "                y = temp.value,\n",
    "                name = item,\n",
    "                mode = 'lines+markers',\n",
    "            )\n",
    "    traces.append(trace)\n",
    "        \n",
    "    for i, item in enumerate(drug):\n",
    "        temp = temp_p[ temp_p['drug'] == item]\n",
    "        trace = go.Scatter(\n",
    "                    x = temp.index,\n",
    "                    y = np.ones((1, len(temp)))[0] * temp_min - i - 1,\n",
    "                    name = item,\n",
    "                    mode = 'markers',\n",
    "                    marker = {\n",
    "                        'size': temp.norm_size * 10\n",
    "                    }\n",
    "                )\n",
    "        traces.append(trace)\n",
    "    \n",
    "    layout = go.Layout(\n",
    "        legend = {'x': 0.5, 'y': -0.1, 'orientation': 'h', 'xanchor': 'center'},\n",
    "        margin = {'l': 300, 'b': 10, 't': 10, 'r': 300},\n",
    "        hovermode = 'closest',\n",
    "    )\n",
    "    return {'data': traces, 'layout': layout}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "app.run_server(port = 8050)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Takeaway\n",
    "* Python is excellent for data science\n",
    " * Easy to analyze and visualize data\n",
    " * Quickly develop a data-driven web application"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## External Resources (1/2)\n",
    "* Introductory course of Python as a data analysis tool\n",
    " * Getting and Cleaning Data (Johns Hopkins University)\n",
    " * https://www.coursera.org/learn/data-cleaning#\n",
    "* Introductory course of Data Science \n",
    " * Foundations of Data Science — Spring 2016 (Berkeley University) \n",
    " * https://data-8.appspot.com/sp16/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## External Resources (2/2)\n",
    "* Docker\n",
    " * https://docs.docker.com/get-started/\n",
    " * deploy containerized instances\n",
    "* Kaggle: the largest and most diverse data community in the world\n",
    " * Competitions and take an advantage of using “Kernels”\n",
    " * www.kaggle.com/competitions\n",
    " * Example kernels:\n",
    "   * Can you improve lung cancer detection? (https://goo.gl/MV01o3)\n",
    "   * Transforming How We Diagnose Heart Disease (https://goo.gl/b9Rta1)\n",
    "   * Predict West Nile virus in mosquitos across the city of Chicago (https://goo.gl/VdVKtF)\n",
    "   * and more\n",
    "* Visualization examples\n",
    " * Matplotlib (https://matplotlib.org/gallery/index.html)\n",
    " * Plotly (https://plot.ly/python/)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Question?\n",
    "Authors: Prof. med. Thomas Ganslandt <Thomas.Ganslandt@medma.uni-heidelberg.de> <br>\n",
    "and Kim Hee <HeeEun.Kim@medma.uni-heidelberg.de> <br>\n",
    "\n",
    "Heinrich-Lanz-Center for Digital Health (HLZ) of the Medical Faculty Mannheim <br>\n",
    "Heidelberg University\n",
    "\n",
    "This is a part of a tutorial prepared for TMF summer school on 03.07.2019"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "dataviz",
   "language": "python",
   "name": "dataviz"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "nbTranslate": {
   "displayLangs": [
    "*"
   ],
   "hotkey": "alt-t",
   "langInMainMenu": true,
   "sourceLang": "en",
   "targetLang": "fr",
   "useGoogleTranslate": true
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
